On The Direct Maximization of Quadratic Weighted Kappa
Abstract
In recent years, quadratic weighted kappa has been growing in popularity in the machine learning community as an evaluation metric in domains where the target labels to be predicted are drawn from integer ratings, usually obtained from human experts. For example, it was the metric of choice in several recent, high-profile machine learning contests hosted on Kaggle: www.kaggle.com/c/asap-aes, www.kaggle.com/c/asap-sas, and www.kaggle.com/c/diabetic-retinopathy-detection. Yet little is understood about the nature of this metric, its underlying mathematical properties, where it fits among other common evaluation metrics such as mean squared error (MSE) and correlation, or whether it can be optimized analytically and, if so, how. Much of this is due to the cumbersome way in which the metric is commonly defined. In this paper we first derive an equivalent but much simpler and more useful definition of quadratic weighted kappa, and then employ this alternate form to address the above issues.

1. Preliminaries

Although first developed in the statistical community as a measure of inter-rater agreement, κ has more recently become a popular performance metric in supervised machine learning, specifically in situations where the target (dependent) variable $y$ is a discrete interval variable (usually drawn from the non-negative integers), as is common in most human rating scales (e.g. "on a scale from 1 to 10"). This differs from the ordinal regression setting, where there exists only an ordering over labels, with no intrinsic or constant-length interval between them. Some have argued that using quadratic weighted kappa as a metric in the domain of human ratings imposes the erroneous assumption of "equal intervals" where there should be none (how this assumption is expressed in the metric itself will be made clear in the following section). For example, when rating student essays on a scale from 1 to 5, the difference between a 1 and a 2 may not be equal to the difference between a 4 and a 5. While this may or may not be true in certain cases, we will not be concerned with it here.

1.1. Standard Definition

Quadratic weighted kappa, which we write κ to distinguish it from linear weighted kappa, was originally developed as a measure of inter-rater agreement. In this scenario there are two raters, A and B, each associated with a vector of $n$ integer ratings, $\mathbf{a}, \mathbf{b} \in L^{n \times 1}$, where $L = \{1, 2, \dots, \ell\}$ is a finite set of $\ell$ possible values. We seek to quantify the level of agreement between $\mathbf{a}$ and $\mathbf{b}$. In order to compute $\kappa(\mathbf{a}, \mathbf{b})$, it is customary to start by computing frequency tables. The observed confusion matrix $U = (u_{i,j}) \in \mathbb{N}^{\ell \times \ell}$ is first computed as:
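As a minimal sketch of the customary computation, the Python below follows the textbook form of Cohen's kappa with quadratic weights, in which u_{i,j} counts the items rated i by A and j by B. This formulation is assumed rather than taken from the text here, and the function name `quadratic_weighted_kappa` and the parameter `num_labels` (standing in for ℓ) are illustrative choices.

```python
def quadratic_weighted_kappa(a, b, num_labels):
    """Textbook Cohen's kappa with quadratic weights for ratings in 1..num_labels.

    Assumed formulation: kappa = 1 - sum(w * U) / sum(w * E), where U is the
    observed confusion matrix, E the matrix expected under independence of the
    two raters, and w[i][j] = (i - j)**2.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")
    if any(not 1 <= r <= num_labels for r in list(a) + list(b)):
        raise ValueError("ratings must lie in 1..num_labels")

    n = len(a)
    # Observed confusion matrix: u[i][j] counts items rated i+1 by A and j+1 by B.
    u = [[0] * num_labels for _ in range(num_labels)]
    for x, y in zip(a, b):
        u[x - 1][y - 1] += 1

    # Marginal rating histograms of each rater.
    hist_a = [sum(row) for row in u]
    hist_b = [sum(u[i][j] for i in range(num_labels)) for j in range(num_labels)]

    observed = 0.0
    expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # The usual 1 / (num_labels - 1)**2 normalization cancels in the ratio.
            w = (i - j) ** 2
            observed += w * u[i][j]
            expected += w * hist_a[i] * hist_b[j] / n

    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return 1.0 - observed / expected
```

Under this formulation, identical rating vectors give a value of 1, and a fully reversed rating vector on a symmetric scale gives -1.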
Similar resources
Nurse-Physician Agreement on Triage Category: A Reliability Analysis of Emergency Severity Index
Background and Objectives: The Emergency Severity Index (ESI) triage is commonly used in clinical settings to determine the patients’ emergency severity. However, the reliability of this index is not sufficiently explored. The present study examines the inter-rater reliability of ESI by comparing triage ratings as performed by nurses and physicians. Methods: This prospective cross-sectional st...
A QUADRATIC MARGIN-BASED MODEL FOR WEIGHTING FUZZY CLASSIFICATION RULES INSPIRED BY SUPPORT VECTOR MACHINES
Recently, tuning the weights of the rules in Fuzzy Rule-Base Classification Systems is researched in order to improve the accuracy of classification. In this paper, a margin-based optimization model, inspired by Support Vector Machine classifiers, is proposed to compute these fuzzy rule weights. This approach not only considers both accuracy and generalization criteria in a single objective fu...
Single leg mini squat: an inter-tester reproducibility study of children in the age of 9–10 and 12–14 years presented by various methods of kappa calculation
BACKGROUND Multiple studies suggest that reduced postural orientation is a possible risk factor for both patello-femoral joint pain (PFP) and rupture of the anterior cruciate ligament (ACL). In order to prevent PFP and ACL injuries in adolescent athletes, it is necessary to develop simple and predictive screening tests to identify those at high risk. Single Leg Mini Squat (SLMS) is a functional...
Cohen’s quadratically weighted kappa is higher than linearly weighted kappa for tridiagonal agreement tables
Cohen’s weighted kappa is a popular descriptive statistic for measuring the agreement between two raters on an ordinal scale. Popular weights for weighted kappa are the linear weights and the quadratic weights. It has been frequently observed in the literature that the value of the quadratically weighted kappa is higher than the value of the linearly weighted kappa. In this paper this phenomeno...
Inter-rater Reliability of Triages Performed by the Electronic Triage System.
OBJECTIVE To examine the inter-rater reliability of triages performed by the Electronic Triage System (ETS), which has recently been developed and used in hospital emergency departments (ED). METHODS This cross-sectional study was conducted prospectively and studied 408 visitors of Tabriz Imam Reza hospital's ED. The variables of interest were age, sex, nurse-assigned triage category, physician-assi...
Journal title:
- CoRR
Volume: abs/1509.07107
Publication date: 2015